Myanmar Number Normalization for Text-to-Speech
نویسندگان
چکیده
--Text Normalization is an essential module for Text-to-Speech (TTS) system as TTS systems need to work on real text. This paper describes Myanmar number normalization designed for Myanmar Text-to-Speech system. Semiotic classes forMyanmar language are identified by the study of Myanmar text corpus and Weighted Finite State Transducers (WFST) based Myanmar number normalization is implemented. Number suffixes and prefixes are also applied for token classification and finally, postprocessing has been done for tokens that cannot be classified. This approach achieves average tag accuracy of 93.5% for classification phase and average Word Error Rate (WER) 0.95% for overall performance which is 5.65% lower than rule-based system. The results show that this approach can be used in Myanmar TTS system and to our knowledge, this is the first published work of Myanmar number normalization system designed for Myanmar TTS system. Keywords-Myanmar Number Normalization; Text Normalization; Weighted Finite State Transducer; Myanmar Text-to-Speech; Myanmar;
منابع مشابه
Multilingual number transcription for text-to-speech conversion
This paper describes the text normalization module of a text to speech fully-trainable conversion system and its application to number transcription. The main target is to generate a language independent text normalization module, based on data instead of on expert rules. This paper proposes a general architecture based on statistical machine translation techniques. This proposal is composed of...
متن کاملIdentification of Adopted Pali Words in Myanmar Text
Myanmar language has been significantly influenced by Pali language due to the practice of Buddhism and study of Buddhist literature in Myanmar. As a result, Pali words have been widely adopted and used in Myanmar language. This study presents an algorithm for identifying Myanmar-adopted Pali words in Myanmar text. The system employs a combination of rule-based syllable segmentation and a dicti...
متن کاملAn Expanded Taxonomy of Semiotic Classes for Text Normalization
We describe an expanded taxonomy of semiotic classes for text normalization, building upon the work in [1]. We add a large number of categories of non-standard words (NSWs) that we believe a robust real-world text normalization system will have to be able to process. Our new categories are based upon empirical findings encountered while building text normalization systems across many languages,...
متن کاملHMM based myanmar text to speech system
This paper presents a complete statistical speech synthesizer for Myanmar which includes a syllable segmenter, text normalizer, grapheme-to-phoneme convertor, and an HMM-based speech synthesis engine. We believe this is the first such system for the Myanmar language. We performed a thorough human evaluation of the synthesizer relative to human and re-synthesized baselines. Our results show that...
متن کاملText Normalization and Unit Selection for a Memory Based Non Uniform Unit Selection TTS in Malayalam
Text to speech synthesis system intended for any language, converts the given text in that language to corresponding speech. The major challenge in TTS system is to generate artificial speech which appears to be natural and intelligible. This is essential for visually impaired people to properly understand and comprehend the generated speech. This paper discuss about text normalization and unit...
متن کامل